21 research outputs found

    Manycore processing of repeated range queries over massive moving objects observations

    The ability to process significant amounts of continuously updated spatial data in a timely manner is mandatory for an increasing number of applications. Parallelism enables such applications to face this data-intensive challenge and allows the devised systems to feature low latency and high scalability. In this paper we focus on a specific data-intensive problem, concerning the repeated processing of huge amounts of range queries over massive sets of moving objects, where the spatial extents of queries and objects are continuously modified over time. To tackle this problem and significantly accelerate query processing we devise a hybrid CPU/GPU pipeline that compresses data output and saves query processing work. The devised system relies on an ad-hoc spatial index leading to a problem decomposition that results in a set of independent data-parallel tasks. The index is based on a point-region quadtree space decomposition and effectively handles a broad range of spatial object distributions, even highly skewed ones. Also, to deal with the architectural peculiarities and limitations of GPUs, we adopt non-trivial GPU data structures that avoid the need for locked memory accesses and favour coalesced memory accesses, thus enhancing the overall memory throughput. To the best of our knowledge this is the first work that exploits GPUs to efficiently solve repeated range queries over massive sets of continuously moving objects characterized by highly skewed spatial distributions. In comparison with state-of-the-art CPU-based implementations, our method achieves significant speedups in the order of 14x-20x, depending on the dataset, even when considering very cheap GPUs.

    Query-level Early Exit for Additive Learning-to-Rank Ensembles

    Search engine ranking pipelines are commonly based on large ensembles of machine-learned decision trees. The tight constraints on query response time recently motivated researchers to investigate algorithms that speed up the traversal of the additive ensemble or terminate early the evaluation of documents that are unlikely to be ranked among the top-k. In this paper, we investigate the novel problem of query-level early exiting, aimed at deciding whether it is profitable to stop the traversal of the ranking ensemble early for all the candidate documents to be scored for a query, by simply returning a ranking based on the additive scores computed by a limited portion of the ensemble. Besides the obvious advantage on query latency and throughput, we address the possible positive impact on ranking effectiveness. To this end, we study the actual contribution of incremental portions of the tree ensemble to the ranking of the top-k documents scored for a given query. Our main finding is that queries exhibit different behaviors as scores are accumulated during the traversal of the ensemble and that query-level early stopping can remarkably improve ranking quality. We present a reproducible and comprehensive experimental evaluation, conducted on two public datasets, showing that query-level early exiting achieves an overall gain of up to 7.5% in terms of NDCG@10 with a speedup of the scoring process of up to 2.2x.
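The idea of query-level early exiting can be illustrated with a minimal Python sketch. It is not the authors' implementation: the ensemble is modelled as a plain list of callables, and the exit point is chosen by a hypothetical oracle (`oracle_exit`) that looks at relevance labels, which are not available at query time. A deployed system would presumably need a learned predictor in its place. The names `partial_scores`, `ndcg_at_k`, `oracle_exit`, and the `sentinels` parameter are assumptions introduced here for illustration.

```python
import math

def partial_scores(trees, docs, sentinel):
    """Additive scores computed with only the first `sentinel` trees."""
    return [sum(tree(doc) for tree in trees[:sentinel]) for doc in docs]

def dcg_at_k(labels, k):
    """Discounted cumulative gain of a label list already in ranked order."""
    return sum((2 ** l - 1) / math.log2(i + 2) for i, l in enumerate(labels[:k]))

def ndcg_at_k(scores, labels, k=10):
    """NDCG@k of the ranking induced by sorting documents by descending score."""
    ranked = [l for _, l in sorted(zip(scores, labels), key=lambda p: -p[0])]
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal if ideal > 0 else 0.0

def oracle_exit(trees, docs, labels, sentinels, k=10):
    """Return (sentinel, ndcg) for the earliest sentinel with the best NDCG@k.

    Hypothetical oracle: it uses the true labels, so it only measures how
    much an ideal query-level exit decision could gain.
    """
    best = None
    for s in sentinels:
        quality = ndcg_at_k(partial_scores(trees, docs, s), labels, k)
        if best is None or quality > best[1]:
            best = (s, quality)
    return best

# Toy query: the second tree adds a contribution that harms the ranking,
# so stopping after the first tree is both cheaper and more effective.
trees = [lambda d: d[0], lambda d: 3 * d[1]]
docs = [(1.0, 0.0), (0.0, 1.0)]
labels = [1, 0]
exit_point, quality = oracle_exit(trees, docs, labels, sentinels=[1, 2])
```

In this toy case the oracle exits after the first tree, because the full ensemble ranks the irrelevant document first.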

    Parallel Traversal of Large Ensembles of Decision Trees

    Machine-learnt models based on additive ensembles of regression trees are currently deemed the best solution to address complex classification, regression, and ranking tasks. The deployment of such models is computationally demanding: to compute the final prediction, the whole ensemble must be traversed by accumulating the contributions of all its trees. In particular, traversal cost impacts applications where the number of candidate items is large, the time budget available to apply the learnt model to them is limited, and the users' expectations in terms of quality of service are high. Document ranking in web search, where sub-optimal ranking models are deployed to find a proper trade-off between efficiency and effectiveness of query answering, is probably the most typical example of this challenging issue. This paper investigates multi/many-core parallelization strategies for speeding up the traversal of large ensembles of regression trees, thus obtaining machine-learnt models that are, at the same time, effective, fast, and scalable. Our best results are obtained by the GPU-based parallelization of the state-of-the-art algorithm, with speedups of up to 102.6x.

    Processing streams of spatial k-NN queries and position updates on manycore GPUs

    The ability to process significant amounts of continuously updated spatial data in a timely manner is mandatory for an increasing number of applications. In this paper we focus on a specific data-intensive problem concerning the repeated processing of huge amounts of k nearest neighbours (k-NN) queries over massive sets of moving objects, where the spatial extents of queries and the positions of objects are continuously modified over time. In particular, we propose a novel hybrid CPU/GPU pipeline that significantly accelerates query processing thanks to a combination of ad-hoc data structures and non-trivial memory access patterns. To the best of our knowledge this is the first work that exploits GPUs to efficiently solve repeated k-NN queries over massive sets of continuously moving objects, even when characterized by highly skewed spatial distributions. In comparison with state-of-the-art sequential CPU-based implementations, our method achieves significant speedups in the order of 10x-20x, depending on the dataset, even when considering cheap GPUs.

    Trade-off Aware Sequenced Routing Queries (or OSR Queries when POIs are not Free)

    The well-known Optimal Sequenced Routing (OSR) query considers a traveller who needs to stop by some cost-free points of interest (POIs), each belonging to a given strict sequence of categories of interest (COIs), while minimizing only the distance traveled. In this paper we extend the OSR query by adding the constraints that (1) each POI yields a non-null cost and that (2) the traveller wishes to minimize the travel distance as well as the total cost of the POIs he/she stops by. We name this new query Trade-Off Aware Sequenced Routing (TASeR). The challenging aspect of this query is that it is not always possible to optimize both travel distance and total POI cost simultaneously. In addition, combining both criteria into a single one with predetermined weights may not be desirable or even feasible. As our main contribution we make use of the linear skyline paradigm, along with provably correct pruning criteria, to propose an approach that very efficiently finds all optimal solutions for any linear combination of the two competing criteria. Our experiments using real city-scale data show that our proposed approach can obtain optimal linear skyline sets in sub-second processing time for reasonably sized instances of the TASeR query. Moreover, we show that any instance of the traditional OSR query can be easily modeled as a TASeR query; hence, our proposed approach can also solve OSR queries at the expense of negligible overhead.
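As a rough illustration of the linear skyline paradigm mentioned in the abstract, the Python sketch below takes a set of candidate routes that has already been enumerated, each given as a `(distance, cost)` pair, and keeps those that are the unique minimizer of `w * distance + (1 - w) * cost` for some weight `w` strictly between 0 and 1. Geometrically these are the vertices of the lower-left convex hull. This shows only the definition, under the assumption that candidates are given explicitly. It is not the authors' algorithm, whose pruning criteria exist precisely to avoid enumerating all routes. The function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def linear_skyline(points):
    """Return the (distance, cost) pairs on the lower-left convex hull.

    Both criteria are minimized. Points that only tie with a hull vertex
    (collinear points on a hull edge) are dropped.
    """
    pts = sorted(set(points))

    # Step 1: conventional skyline. Scanning by increasing distance,
    # keep a point only if its cost is strictly lower than all kept so far.
    skyline = []
    for x, y in pts:
        if not skyline or y < skyline[-1][1]:
            skyline.append((x, y))

    # Step 2: lower convex hull of the skyline (monotone chain).
    hull = []
    for p in skyline:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

# Toy candidate routes as (distance, cost) pairs.
routes = [(1, 10), (2, 6), (3, 5), (4, 1), (5, 5), (3, 7)]
result = linear_skyline(routes)
```

Here `(3, 5)` is not dominated by any other route, so it belongs to the conventional skyline, but it lies above the segment joining `(2, 6)` and `(4, 1)` and is therefore never optimal for any linear combination of the two criteria.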

    Towards A Personal Shopper's Dilemma: Time vs Cost

    Consider a customer who has a shopping list and a personal shopper who is willing to buy and resell the goods on the customer's shopping list. It is in the personal shopper's best interest to find shopping routes that minimize two competing criteria: the time needed to serve a customer and the price paid for the goods. In this short paper we present an efficient solution to this problem based on finding an approximate linear skyline set of such shopping routes. (An extended version of this paper can be found at [1].)

    Mining Condensed Spatial Co-Location Patterns

    The discovery of co-location patterns among spatial events is an important task in spatial data mining. We introduce a new kind of spatial co-location pattern, named condensed spatial co-location patterns, that can be considered a lossy compressed representation of all the co-location patterns. Each condensed pattern is the representative, and a superset, of a group of spatial co-location patterns in the full set of patterns such that the differences between the interestingness measure of the representative and the measures of the patterns belonging to the associated group are negligible. Our preliminary experiments show that condensed spatial co-location patterns are less sensitive to parameter changes and more robust in the presence of missing data than closed spatial co-location patterns.

    Semantic Enrichment of Mobility Data: A Comprehensive Methodology and the MAT-BUILDER System

    The widespread adoption of personal location devices, the Internet of Mobile Things, and Location Based Social Networks enables the collection of vast amounts of movement data. This data often needs to be enriched with a variety of semantic dimensions, or aspects, that provide contextual and heterogeneous information about the surrounding environment, resulting in the creation of multiple aspect trajectories (MATs). Common examples of aspects include points of interest, user photos, transportation means, weather conditions, social media posts, and many more. However, the literature does not currently provide a consensus on how to semantically enrich mobility data with aspects, particularly in dynamic scenarios where semantic information is extracted from numerous and heterogeneous external data sources. In this work, we aim to address this issue by presenting a comprehensive methodology to help end users instantiate their semantic enrichment processes of movement data. The methodology is agnostic to semantic aspects and external semantic data sources. The vision behind our methodology rests on three pillars: (1) three design principles which we argue are necessary for designing systems capable of instantiating arbitrary semantic enrichment processes; (2) the MAT-Builder system, which embodies these principles; (3) the use of an RDF knowledge graph-based representation to store MAT datasets, thereby enabling uniform querying and analysis of enriched movement data. We qualitatively evaluate the methodology in two complementary example scenarios, where we show both the potential in generating interesting and useful semantically enriched mobility datasets, and the expressive power in querying the resulting RDF trajectories with SPARQL.